1. INTRODUCTION

Forecasting is the analysis of trends to predict the future on the basis of past information and
present data, and it plays a very important role throughout modern life [1]. Various prediction
methods have been developed and used in different fields, such as the stock market [2-5],
electrical load forecasting [6-7], economic forecasting [8-9], medicine [10-11] and many more.

Streamflow forecasting is one of the earliest forecasting problems to draw the attention
of scientists, because the value of estimating streamflow for livelihoods around a stream has been
recognized since the earliest times. In fact, records of the flow level of the River Nile date back to
around 3000 B.C., and the ancient Egyptians' annual peak stream levels from 3050 B.C. to 2500 B.C.
have also been discovered [12]. This is one of the longest recorded time series of a natural phenomenon.
It is used not only to characterize the streamflow problem itself but also as a benchmark time series for
studying and scrutinizing different forecasting algorithms [13]. Improved forecasting accuracy
can, in turn, help the operation of a reservoir system.

Adopting appropriate forecasting techniques in hydrological forecasting can help to
predict and ease the operation of the system in the future, because forecast accuracy improves.
Research has found that successful forecasting depends on accurate estimation of the model. Thus, all
forecasting techniques, both qualitative and quantitative, are designed to produce an accurate model.
Because predicting future values is very important for environmental protection and flood control,
data-driven modelling techniques have gained popularity in the hydrology field [14]. According to [15],
various techniques have been adopted to predict streamflow values, such as Artificial Neural Networks
(ANN), the Stepwise Multiple Linear Regression model (SWMLR), Zero Order Forecasts (ZOF) and others.

In addition, regression analysis, including Simple Regression, Multiple Regression
and Autoregressive (AR) models, has frequently been used to forecast future values of streamflow [16].
For instance, [17] demonstrated that one-month-ahead streamflow forecasting appears more flexible with
Autoregressive Integrated Moving Average (ARIMA) and Seasonal ARIMA (SARIMA) models.
It has also been shown that an ANN obtained a close fit during the calibration period and outperformed a
conventional model of the Winnipeg Flow Forecasting System [18].

Among the Box-Jenkins variations, the AR model has commonly been used to forecast annual flows, and [19]
demonstrated that ARMA was able to forecast hydrological data such as river flow. This was because
the data sets used in that study fulfilled the ARMA assumptions of linearity and stationarity.
However, time series with a timescale shorter than a year mostly show strong seasonality.
For monthly or quarter-monthly streamflow, SARIMA, Deseasonalized ARMA (DARMA) and Periodic ARMA
(PARMA) have been proposed, as these models are believed to forecast better than others. In [20],
the monthly inflow of the Dez dam reservoir was forecast using ARMA and ARIMA, with the number of
parameters increased to four to improve the forecast accuracy. A previous study [21] stated that,
based on the relative error, the appropriate model for runoff forecasting in the United States is
SARIMA; the SARIMA (0,1,4)(1,0,1)_12 model, with the maximum R^2 and minimum Mean Biased Error (MBE),
was capable of long-term average runoff forecasting across the United States.

Artificial Neural Networks have been widely used in a range of fields, for example
digital images [22], fault detection [23], gold price forecasting [24] and many more. They have attracted
growing interest since the rediscovery and popularization of the backpropagation algorithm by Rumelhart
and McClelland in 1986. ANN is also accepted as a tool for modelling complex hydrological data, since it
can deal with complicated problems through a pattern recognition methodology [25]. The use of the
ANN approach in water resource problems has also gained increasing popularity because of the complex
interrelationships in such systems, which may be nonlinear and multivariate [26]. An interesting
advantage of ANNs is that they do not need an explicit description of the structure of the process they
are predicting [27].

Different types of ANN approach have been applied in forecasting hydrological data [28].
For example, [29] forecast the daily discharge of a river basin using Adaptive Neuro-Fuzzy Inference
Systems (ANFIS), which appeared more accurate than ANN and Multiple Nonlinear Regression (MNLR).
Researchers have also confirmed that 1-day-ahead streamflow forecasts made with ANN were better
than those predicted with an AR model, and the approach was said to be a useful tool for solving
specific problems in hydrology. Meanwhile, [30] showed that the Radial Basis Function Neural Network
(RBFNN) is better in quality and can provide high accuracy and reliability for daily streamflow
forecasting, but is less commonly used than the Feed-Forward Back-Propagation Neural Network (FFNN)
forecasting model.

Streamflow forecasting using the ANN methodology has shown performance comparable to that of
traditional or conceptual techniques [31]. Preliminary research discussed the utility of ANN for
forecasting hydrological variables in past years, and subsequent analysis demonstrated the
effectiveness of multilayer networks trained with back-propagation over conventional statistical
techniques. In addition, ANN with proper data pre-processing performs better than the ARIMA model [32].
Therefore, the main focus of this paper is to compare the forecasting accuracy of the ANN and AR
models in predicting future values of streamflow. It is expected that this study will contribute to
this growing area of research by comparing the forecasting models, even though the forecasting
performance for streamflow may differ between methodologies.


2. RESEARCH METHOD 
This section briefly discusses the case study, the research methodology and the performance
evaluation measures used in the research.


2.1. Study area 
A daily streamflow dataset for the Durian Tunggal Reservoir was used in this study.
The data were collected from 1st June 2008 to 31st May 2014 at the B.11 Air Resam, Melaka monitoring
station, located at the geographic coordinates 2°21'18.4"N, 102°18'58.4"E, as shown in Figure 1. Next,
the dataset was split into a training period and a testing period with a ratio of 80:20. The training
period runs from 1st June 2008 to 31st May 2012 and the testing period from 1st June 2012 to
31st May 2014. The training dataset is used to develop the appropriate prediction models and the
testing dataset is used for model checking.

Figure 1. Location of Durian Tunggal Reservoir
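
A chronological split of this kind keeps the observations in time order, so that the testing period always follows the training period. The sketch below is only an illustration: it assumes the series is held in a Python list ordered by date, and the function name `split_chronological` and the example flow values are hypothetical rather than taken from the study.

```python
def split_chronological(series, train_fraction=0.8):
    """Split an ordered series into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)  # index of the first test observation
    return series[:cut], series[cut:]


# Hypothetical daily flows; the real study data are not reproduced here.
flows = [0.42, 0.40, 0.39, 0.55, 0.71, 0.60, 0.48, 0.45, 0.43, 0.41]
train, test = split_chronological(flows, 0.8)
```

Because the data are never shuffled, every test observation lies later in time than every training observation.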


2.2. Box-Jenkins approach
The AR model is commonly used to forecast annual flows. The AR (p) forecasting model is defined as:

$$y_t = \varphi_1 y_{t-1} + \varphi_2 y_{t-2} + \cdots + \varphi_p y_{t-p} + \varepsilon_t \qquad (1)$$

where $y_t$ is the predicted value in period $t$, expressed in terms of the $p$ preceding values of the
time series; $\varphi_i$ are the coefficients associated with each previously observed value; and
$\varepsilon_t$ is a normal white noise process with zero mean and constant variance. In this study,
AR models were fitted to the streamflow data.
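
To make the AR (p) form concrete, the following sketch estimates the coefficients by ordinary least squares and produces a one-step-ahead forecast. It is a minimal illustration under stated assumptions, not the estimation procedure used in the study: the model has no intercept, the function names `fit_ar` and `forecast_one_step` are hypothetical, and the demonstration series is synthetic.

```python
def fit_ar(series, p):
    """Least-squares estimate of AR(p) coefficients (no intercept; an assumption)."""
    # Each row holds the p previous values: y[t-1], y[t-2], ..., y[t-p].
    rows = [[series[t - k] for k in range(1, p + 1)] for t in range(p, len(series))]
    targets = [series[t] for t in range(p, len(series))]
    # Normal equations: (X^T X) phi = X^T y.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = a[r][col] / a[col][col]
            for c in range(col, p):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    phi = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(a[i][j] * phi[j] for j in range(i + 1, p))
        phi[i] = (b[i] - tail) / a[i][i]
    return phi


def forecast_one_step(series, phi):
    """Predict the next value from the last len(phi) observations."""
    return sum(phi[k] * series[-1 - k] for k in range(len(phi)))


# Synthetic noise-free AR(2) series with known coefficients 0.5 and 0.3.
demo = [1.0, 2.0]
for _ in range(28):
    demo.append(0.5 * demo[-1] + 0.3 * demo[-2])
phi_hat = fit_ar(demo, 2)
next_value = forecast_one_step(demo, phi_hat)
```

On a noise-free series the fit recovers the generating coefficients; with real streamflow data the estimates would carry sampling error, and a dedicated time series library would normally be preferred.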


2.3. ANN approach 

An Artificial Neural Network is a nonlinear modelling technique that processes inputs to produce
outputs. The neural network structure most often suggested for modelling is the multilayer perceptron
(MLP), which is a feed-forward network. Usually, the structure consists of three connected layers of
neurons. The number of neurons in the input and output layers is specified by the problem to which the
network is applied. The layers comprise an input layer that receives the input data, a hidden layer
that processes the data, and an output layer whose output is given by:

$$y = \varphi\left(\sum_{i=1}^{n} w_i x_i + b\right) = \varphi\left(w^{T} x + b\right) \qquad (2)$$

where $w$ is the vector of weights, $x$ is the vector of $n$ inputs, $b$ is the bias, and $\varphi$ is
a non-linear activation function. Before applying ANN, [33] suggested rescaling the data to the range
[0,1] by using min-max normalization.
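
The neuron computation and the min-max rescaling described here can be sketched as follows. This is an illustration only: the sigmoid activation and the linear output unit are assumptions (the text does not name the activation function used), and all function names and example values are hypothetical.

```python
import math


def min_max_scale(values):
    """Rescale values to [0, 1]; also return the min and max for later inversion."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


def inverse_scale(scaled, lo, hi):
    """Map scaled values back to the original units."""
    return [s * (hi - lo) + lo for s in scaled]


def sigmoid(z):
    """Logistic activation (an assumed choice of non-linear activation)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(x, w, b, activation=sigmoid):
    """Single neuron: activation of the weighted sum of inputs plus the bias."""
    return activation(sum(wi * xi for wi, xi in zip(w, x)) + b)


def mlp_forward(x, hidden_w, hidden_b, out_w, out_b):
    """One hidden layer followed by a linear output unit (an assumption)."""
    hidden = [neuron(x, w, b) for w, b in zip(hidden_w, hidden_b)]
    return sum(w * h for w, h in zip(out_w, hidden)) + out_b


# Hypothetical values, for illustration only.
scaled, lo, hi = min_max_scale([2.0, 4.0, 6.0])
restored = inverse_scale(scaled, lo, hi)
single = neuron([0.0, 0.0], [1.0, 1.0], 0.0)
output = mlp_forward([0.0, 0.0], [[1.0, 1.0], [2.0, -1.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
```

Keeping the minimum and maximum from the scaling step allows forecasts made on the [0,1] scale to be converted back to flow units.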


2.4. Evaluation of model performance 
A prediction error is calculated to evaluate the adequacy of each model in terms of how
well the model forecasts. Therefore, five types of error measure are used, namely MAE, RMSE, MAPE,
MFE and CE. The MAE is given by:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|Y_i - \hat{Y}_i\right| \qquad (3)$$

where $Y_i$ is the ith actual observation for the constituent being evaluated, $\hat{Y}_i$ is the ith
simulated value for the constituent being evaluated, $\bar{Y}$ is the mean of the observed data for the
constituent being evaluated, and $n$ is the total number of observations. The lower the MAE, RMSE,
MAPE and MFE, and the higher the CE, the higher the accuracy of the forecast.
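
The five measures can be sketched in code as shown below. The definitions are the commonly used forms and are assumptions wherever the text does not spell them out: CE is taken to be the Nash-Sutcliffe form, and the sign of MFE is chosen so that positive values indicate over-forecasting, which matches how the results are interpreted later, although many references use the opposite sign. The function names and sample values are hypothetical.

```python
import math


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mape(obs, sim):
    """Mean absolute percentage error in percent; undefined if any observation is zero."""
    return 100.0 * sum(abs((o - s) / o) for o, s in zip(obs, sim)) / len(obs)


def mfe(obs, sim):
    """Mean forecast error as simulated minus observed (assumed sign convention)."""
    return sum(s - o for o, s in zip(obs, sim)) / len(obs)


def ce(obs, sim):
    """Coefficient of efficiency, assumed here to be the Nash-Sutcliffe form."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


# Hypothetical observed and simulated flows, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
simulated = [1.1, 1.9, 3.2, 3.8]
scores = {
    "MAE": mae(observed, simulated),
    "RMSE": rmse(observed, simulated),
    "MAPE": mape(observed, simulated),
    "MFE": mfe(observed, simulated),
    "CE": ce(observed, simulated),
}
```

Note that CE computed this way is a fraction with a maximum of 1; multiplying by 100 gives the percentage form.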


3. RESULTS AND DISCUSSION 

Figure 2 shows the data from 1st June 2008 to 31st May 2014, split into training and testing
datasets. The changes in the volume of Durian Tunggal streamflow were plotted against time to identify
any pattern. However, no obvious pattern, such as a trend, cyclic or seasonal pattern, exists.


Figure 2. Time series plot of Durian Tunggal streamflow


The Box-Jenkins approach is applied to simulate Durian Tunggal streamflow. Since the time
series is stationary, AR models are identified. From the training data, the Box-Jenkins procedure
resulted in six alternative AR models. AR (3), with the smallest AIC and BIC values, and AR (6), with
the largest correlation, were chosen as possible tentative models for forecasting. The forecasting
error measurements for the alternative AR models are stated in Table 1 and compared with each other,
using MAE, RMSE, MAPE, MFE and CE to measure the error. Based on the error measurements of the
alternative AR models on the testing data, it is concluded that AR (4) is the best model.

Table 1. Input of alternative AR models for streamflow


Alternative Models   MAE      RMSE     MAPE (%)   CE
AR (1)               0.3430   0.8167   72.8744    0.2212
AR (2)               0.3431   0.8167   72.4551    0.2212
AR (3)               0.3442   0.8117   70.1890    0.2306
AR (4)               0.3449   0.8094   69.6877    0.2351
AR (5)               0.3515   0.8163   71.5841    0.2219
AR (6)               0.3509   0.8118   71.1823    0.2304


Note: values in black bold are the best values and values in red bold are those chosen to represent the AR model


Next, the Durian Tunggal streamflow datasets are tested using the ANN model.
Following the ANN procedure, the datasets were normalized to the range [0,1] with the min-max
normalization formula. Several alternative models are obtained to determine the best forecasting model.
The ANN used is a multilayer perceptron (MLP), which has an input layer, a hidden layer and an output
layer. The ANN model therefore captures the information in the data by moving back and forth between
the layers. A data lagging technique is also applied in this model to complement the daily data.
Hence, seven models are built with ten different hidden neuron numbers and 10 times cross-validation.
The desired model is the one with the smallest forecasting performance error.
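
The data lagging step can be sketched as below. The text does not spell out its construction, so this illustration assumes that lagging means using the previous daily values as network inputs, and that the first number in an architecture label such as 7-7-1 is the number of lagged inputs. The function name `make_lagged` and the sample series are hypothetical.

```python
def make_lagged(series, n_lags):
    """Build (inputs, targets) where each input holds the n_lags preceding values."""
    if n_lags < 1 or n_lags >= len(series):
        raise ValueError("n_lags must be at least 1 and smaller than the series length")
    inputs, targets = [], []
    for t in range(n_lags, len(series)):
        inputs.append(series[t - n_lags:t])  # oldest value first, y[t-1] last
        targets.append(series[t])            # the value to be forecast
    return inputs, targets


# Hypothetical series, for illustration only.
demo_inputs, demo_targets = make_lagged([1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

Under these assumptions, a model with seven inputs would be trained on rows produced by `make_lagged(series, 7)`, losing the first seven days of the record as targets.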

For further analysis, the error measurements of the Box-Jenkins model and the ANN models
were compared. Table 2 presents the error measurements of the AR and ANN models, so that the model
with the better forecast performance can be identified based on MAE, RMSE, MAPE, MFE and CE.


Table 2. Comparison of error performance measurement between models 


Model         MAE      RMSE     MAPE (%)   CE (%)   MFE
AR (4)        0.3449   0.8094   69.6877    23.51    0.0375
ANN (1-5-1)   0.0596   0.3168    9.3351    88.28    0.0225
ANN (2-3-1)   0.1479   0.3362   40.2021    86.80    0.0266
ANN (3-3-1)   0.0519   0.1843   10.7293    96.03    0.0257
ANN (4-4-1)   0.0209   0.1945    2.2507    95.58    0.0066
ANN (5-5-1)   0.0155   0.1180    2.2949    98.38    0.0070
ANN (6-5-1)   0.0133   0.0723    2.6624    99.39    0.0080
ANN (7-7-1)   0.0116   0.0607    1.8214    99.57    0.0058


The final finding shows that the ANN (7-7-1) model has the smallest values of MAE (0.0116 m³/s),
RMSE (0.0607 m³/s), MAPE (1.8214%) and MFE (0.0058 m³/s) and the largest CE (99.57%) among
the models. Therefore, ANN (7-7-1) is declared the best model, followed by ANN (4-4-1)
and ANN (5-5-1), which can be regarded as the second and third best approaches. All three
outperformed the AR models.

Figure 3 (a-c) illustrates the 1:1 line plots used to examine the level of agreement between the
observed and predicted values of the chosen models. Based on the figures, it can be observed that the
ANN models are able to predict accurately, with a slight over-forecast, as the data points are located
slightly above the 1:1 line. Overall, ANN over-forecast the Durian Tunggal streamflow values by 0.58%,
0.66% and 0.7% for the ANN architectures 7-7-1, 4-4-1 and 5-5-1, respectively. Meanwhile,
Figure 4 (a-c) shows a comparison of the observed and predicted values. Most of the predicted data
points in Figure 4 (a-c) are close to the observed data points, indicating superb prediction
capabilities.

[Panel title: ANN (7-7-1) Model]
Figure 3c. The 1:1 plot for ANN 5-5-1 model


Forecasting accuracy: a comparative study between... (Wan Nur Hawa Fatihah Wan Zurey) 


Figure 4a. Hydrograph plot for ANN 7-7-1 model
Figure 4b. Hydrograph plot for ANN 4-4-1 model
Figure 4c. Hydrograph plot for ANN 5-5-1 model


4. CONCLUSION
The purpose of this study was to use ANN and Box-Jenkins approaches to develop two different models,
which were then compared against each other. In this analysis, the ANN model was developed despite
difficulties in deciding the appropriate model inputs, hidden layer and learning rate. The results
show that, with lower error measures, the ANN model performs better than the AR approach in forecasting
streamflow, which confirms the ability of this approach to provide a useful model for solving
streamflow problems with nonlinear and stationary datasets. Although the findings of the study are very
encouraging and may help to improve decision-making in the development of flood protection systems or
water resource initiatives, the theoretical models still need to be improved before they can be
implemented.